

@GiovaGa (Collaborator) commented Oct 30, 2025

For now, the branch contains only a test that reproduces the bug.

Solves #397

@GiovaGa linked an issue on Oct 30, 2025 that may be closed by this pull request
@GiovaGa marked this pull request as draft on October 30, 2025 11:34
@anyzelman added the bug (Something isn't working) label on Nov 6, 2025
@anyzelman added this to the v0.8 milestone on Nov 6, 2025
@GiovaGa marked this pull request as ready for review on November 11, 2025 09:10
@GiovaGa requested a review from anyzelman on November 11, 2025 09:11
@GiovaGa (Collaborator, Author) commented Nov 11, 2025

Tests seem to pass without issue (we'll run the full test suite after review).
This fix may depend on some assumptions about the number of nonzeros of the two vectors and about the parameters supplied.
It may be good to add some assertions and/or comments documenting these assumptions, but I am not sure what the right things to check are.

@anyzelman (Member) commented

Hi @GiovaGa, I had a quick look. Assuming the fix is correct (I haven't dived deep yet), isn't this line also in error?

https://github.com/Algebraic-Programming/ALP/pull/398/files#diff-7936a0a79fe58f59fa9da39efc2a7c15f7ae9e49710647c69ba507e51cfa8e3dR10464

@GiovaGa (Collaborator, Author) commented Nov 25, 2025

The line you mention certainly looks wrong, because it uses the same index as the line below it. However, it may not be strictly related to the other fix.

@GiovaGa (Collaborator, Author) commented Nov 25, 2025

I have now looked at this code again and propose a new fix that I find more convincing. I also fixed the line you pointed out. I will run the tests to check that everything works as expected.

@GiovaGa (Collaborator, Author) commented Nov 25, 2025

And indeed, the tests failed. I will look into it.

@GiovaGa (Collaborator, Author) commented Nov 25, 2025

After this fix, tests pass on the internal CI except for dot_debug_hyperdags and dot_ndebug_hyperdags; I will check those locally.

@GiovaGa (Collaborator, Author) commented Nov 26, 2025

The tests pass locally. @anyzelman, I think you can now check whether the change makes sense.
You can find my rationale in a comment on the issue.

@anyzelman (Member) commented

Note to self: ready for review

@anyzelman force-pushed the 397_dot_value_nonblocking branch from b9051e9 to 9bd9f93 on December 18, 2025 10:32
…from the left and right inputs). I extended the unit tests to try to trip up the new implementation but, oddly enough, I appear not to have been able to construct a test that does. Nevertheless, I will keep the extended unit test.
… nonzeroes of x, since that is the vector guaranteed to have the fewest nonzeroes. Instead, the error was in the computation of the mask: it used y instead of x. Note in particular that the other fix WAS correct, so there were two bugs in total. Also includes a minor code style fix.
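To illustrate the two ingredients these commit messages mention (iterating over the nonzeroes of the sparser input, and taking the membership test from the correct vector), here is a minimal, generic C++ sketch. It is not ALP's `sparse_dot_generic`: `SparseVec` and `sparse_dot` are hypothetical names, and the sketch assumes the multiplicative operator is commutative (true for `double`), which is what lets it pick the sparser vector at run time instead of asserting which one is sparser.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical coordinate-style sparse vector (NOT ALP's grb::Vector):
// dense value storage, an "assigned" flag per index, and a list of the
// assigned (nonzero) indices.
struct SparseVec {
    std::vector<double> values;   // only meaningful where assigned[i] is true
    std::vector<bool> assigned;   // assigned[i] == true iff index i is a nonzero
    std::vector<std::size_t> nz;  // all indices i with assigned[i] == true

    explicit SparseVec(std::size_t n) : values(n, 0.0), assigned(n, false) {}

    void set(std::size_t i, double v) {
        if (!assigned[i]) { assigned[i] = true; nz.push_back(i); }
        values[i] = v;
    }
};

// Sparse dot product: walk the nonzero list of the sparser input and keep
// only indices that are also assigned in the other input. The index list
// being walked and the membership test must come from *different* vectors;
// testing the walked vector's own flags would accept every index and read
// values the other vector never assigned.
double sparse_dot(const SparseVec& a, const SparseVec& b) {
    assert(a.values.size() == b.values.size());
    // Assumption: multiplication is commutative, so swapping roles is safe.
    const SparseVec& sparse = a.nz.size() <= b.nz.size() ? a : b;
    const SparseVec& other = (&sparse == &a) ? b : a;
    double sum = 0.0;
    for (std::size_t i : sparse.nz) {
        if (other.assigned[i]) {
            sum += sparse.values[i] * other.values[i];
        }
    }
    return sum;
}
```

For example, with x nonzero at indices 1 and 4 (values 2 and 3) and y nonzero at indices 0, 1, 2 and 4 (values 5, 7, 1 and 10), both `sparse_dot(x, y)` and `sparse_dot(y, x)` evaluate to 2*7 + 3*10 = 44.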
@anyzelman (Member) commented

Hi @GiovaGa, could you check my latest commit? I think the issue was only half-resolved before and should now be fully resolved. We can also discuss in person, which is perhaps easier.

@anyzelman (Member) commented

All tests succeed (with LPF). CI is running; if it passes, I will merge. Draft release notes:

The implementation of the dot product for the nonblocking backend could iterate over an inconsistent set of nonzero indices from its input vectors. The bug did not always materialise, and it never materialised when the dot product was called with the dense descriptor. This MR extends the dot-product unit test to cover more challenging sparsity patterns, especially patterns that differ between the inputs, and adds tests with two vectors whose nonzeroes partially overlap as well as two vectors with no overlap at all. The extended unit test was able to trigger the pre-existing bug, which this MR also fixes. While addressing the bug, issue #408 was furthermore uncovered.
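The testing strategy described in these release notes can be sketched generically in C++17: compare a sparse dot product against a dense reference on pairs of sparsity patterns with full, partial and zero overlap, in both argument orders. The helpers `merge_dot`, `dense_dot`, `strided` and `check_patterns` are hypothetical, and `merge_dot` is a simple sorted-index merge standing in for the implementation under test; it is neither ALP's algorithm nor ALP's unit test.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sparse representation: (index, value) pairs sorted by index.
using Sparse = std::vector<std::pair<std::size_t, double>>;

// Sparse dot product by merging two sorted index lists (a stand-in for
// the implementation under test, not ALP's algorithm).
double merge_dot(const Sparse& x, const Sparse& y) {
    double sum = 0.0;
    std::size_t i = 0, j = 0;
    while (i < x.size() && j < y.size()) {
        if (x[i].first < y[j].first) ++i;
        else if (y[j].first < x[i].first) ++j;
        else { sum += x[i].second * y[j].second; ++i; ++j; }
    }
    return sum;
}

// Dense reference: scatter both inputs into zero-filled arrays and sum products.
double dense_dot(const Sparse& x, const Sparse& y, std::size_t n) {
    std::vector<double> dx(n, 0.0), dy(n, 0.0);
    for (const auto& [i, v] : x) dx[i] = v;
    for (const auto& [i, v] : y) dy[i] = v;
    double sum = 0.0;
    for (std::size_t k = 0; k < n; ++k) sum += dx[k] * dy[k];
    return sum;
}

// Pattern with nonzeroes at start, start + stride, ... below n. Values are
// small integers, so every sum is exact in double precision.
Sparse strided(std::size_t n, std::size_t start, std::size_t stride) {
    Sparse s;
    for (std::size_t i = start; i < n; i += stride) s.emplace_back(i, double(i % 7 + 1));
    return s;
}

// Checks the sparse result against the dense reference for several
// pattern pairs, in both argument orders.
bool check_patterns(std::size_t n) {
    const std::vector<std::pair<Sparse, Sparse>> cases = {
        {strided(n, 0, 1), strided(n, 0, 1)},  // both fully populated
        {strided(n, 0, 2), strided(n, 0, 3)},  // partial overlap (multiples of 6)
        {strided(n, 0, 2), strided(n, 1, 2)},  // zero overlap (even vs odd)
        {strided(n, 0, 5), Sparse{}},          // one empty input
    };
    for (const auto& [x, y] : cases) {
        if (merge_dot(x, y) != dense_dot(x, y, n)) return false;
        if (merge_dot(y, x) != dense_dot(y, x, n)) return false;
    }
    return true;
}
```

Small integer values keep every sum exact, so the results can be compared with `==` despite the different summation orders; with general floating-point data, a tolerance would be needed.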

Thanks to @GiovaGa for finding the bug, and for proposing an initial fix and unit test extension!

@anyzelman merged commit adfdd1e into develop on Dec 23, 2025 (2 checks passed)
@GiovaGa (Collaborator, Author) commented Jan 7, 2026

@anyzelman The smoke test for simulated annealing-Replica Exchange (#378) fails with the nonblocking backend (see Actions).
Running it locally, I see that the following assertion fails:

include/graphblas/nonblocking/blas1.hpp:10438: grb::RC grb::internal::sparse_dot_generic(typename AddMonoid::D3&, size_t, size_t, const Coords&, const Coords&, const grb::Vector<MaskType, grb::nonblocking, Coords>&, const grb::Vector<MaskType, grb::nonblocking, Coords>&, size_t, const AddMonoid&, const AnyOp&) [with unsigned int descr = 0; bool already_dense_input_x = false; bool already_dense_input_y = true; AddMonoid = grb::Monoid<grb::operators::add<float>, grb::identities::zero>; AnyOp = grb::operators::mul<float>; InputType1 = float; InputType2 = float; Coords = grb::internal::Coordinates<grb::nonblocking>; typename AddMonoid::D3 = float; size_t = long unsigned int]: Assertion `local_x.nonzeroes() <= local_y.nonzeroes()' failed.


Labels

bug Something isn't working


Development

Successfully merging this pull request may close these issues.

Different behavior of dot with nonblocking backend vs reference

3 participants